Identifying Gene-Specific Variations in Biomedical Text
Abstract
The influence of genetic variations on diseases or cellular processes is the main focus of many investigations, and the results of biomedical studies are often accessible only through scientific publications. Automatic extraction of this information requires recognition of gene names and the accompanying allelic variant information. Previous work presented the OSIRIS system, which detects allelic variation in text using a query expansion approach. The main challenges with this system are its relatively low recall for variation mentions and for gene name recognition. To address them, we integrate the ProMiner system, developed for the recognition and normalization of gene and protein names, with a conditional random field (CRF)-based recognition of variation terms in biomedical text. Using a newly developed normalization of variation entities, we can link textual entities to Single Nucleotide Polymorphism database (dbSNP) entries. We evaluate the performance of this novel approach and report improved results in comparison to state-of-the-art systems.
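To make the normalization step concrete, here is a minimal sketch of how differently written variation mentions might be mapped to one canonical form and then looked up in an index. It is an illustration only: it uses simple regular expressions rather than the CRF-based recognition described in the abstract, and the names `normalize_variation`, `link_to_index`, `TOY_INDEX`, the gene symbol `GENE1`, and the identifier `rs_placeholder` are all hypothetical, not taken from the paper or from dbSNP.

```python
import re
from typing import NamedTuple, Optional

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


class Variation(NamedTuple):
    """Canonical form of a substitution: wild-type symbol, position, mutant symbol."""
    wild: str
    position: int
    mutant: str


# Assumed surface patterns (illustrative, far from exhaustive):
# three-letter form such as "Ala222Val", one-letter form such as "A222V".
_THREE_LETTER = re.compile(r"\b([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")
_ONE_LETTER = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")


def normalize_variation(mention: str) -> Optional[Variation]:
    """Map a textual mention to a canonical Variation, or None if no pattern matches."""
    m = _THREE_LETTER.search(mention)
    if m and m.group(1) in AA3_TO_1 and m.group(3) in AA3_TO_1:
        return Variation(AA3_TO_1[m.group(1)], int(m.group(2)), AA3_TO_1[m.group(3)])
    # One-letter forms are ambiguous between nucleotides and amino acids;
    # this sketch keeps the symbols as written and does not resolve that.
    m = _ONE_LETTER.search(mention)
    if m:
        return Variation(m.group(1), int(m.group(2)), m.group(3))
    return None


# Hypothetical lookup table standing in for a real database index.
TOY_INDEX = {("GENE1", Variation("A", 222, "V")): "rs_placeholder"}


def link_to_index(gene: str, mention: str) -> Optional[str]:
    """Return the identifier for a (gene, variation) pair, if the toy index has one."""
    variation = normalize_variation(mention)
    if variation is None:
        return None
    return TOY_INDEX.get((gene, variation))
```

In this sketch, `link_to_index("GENE1", "Ala222Val")` and `link_to_index("GENE1", "A222V")` resolve to the same entry because both mentions normalize to the same `Variation` tuple. This shows why a normalization step is needed before textual entities can be linked to database entries.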
Similar resources
The GNAT library for local and remote gene mention normalization
SUMMARY Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the Gnat Java library for text retrieval, named enti...
Different approaches for identifying important concepts in probabilistic biomedical text summarization
Automatic text summarization tools help users in the biomedical domain to acquire their intended information from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, it seems that exploring other measures rather than the raw frequency for ...
Cultural Elements in the Translation of Children's Literature: Persian translation of Roald Dahl’s Matilda in focus
Translation can have long-term effects on all languages and cultures. It is not a mere linguistic act, but mostly a cultural act, since language is by nature one of the major carriers of cultural elements. Thus, the translator’s job is not just transferring the meaning of words and sentences from the source text to the target text. Culture-specific items often cause translation problems. Identi...
Integrating text mining into the MGI biocuration workflow
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural languag...
Automatic discourse connective detection in biomedical text
OBJECTIVE Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised mac...
Journal: Journal of Bioinformatics and Computational Biology
Volume 5, Issue 6
Pages: -
Publication date: 2007